Abstract - Structure, functionality, parameters and organization
of the computing Grid in Poland are described, mainly from the
perspective of the high-energy particle physics community,
currently its largest consumer and developer. It represents
a distributed Tier-2 in the worldwide Grid infrastructure.
It also provides services and resources for data-intensive
applications in other sciences.
I. Introduction

Distributed computing was stimulated by human endeavours
in the "big science" domain and, at the same time, by the
hopes of industry to commercialize developments in networking
technologies. Some original ideas for organizing computing
for science, e.g. the pioneering SETI@home [1], initiated by
D. Gedye to search for signals of extraterrestrial
civilizations, eventually evolved into coordinated networks
supporting research projects of unprecedented scale, e.g. the
Large Hadron Collider (LHC^) Computing Grid (LCG) in particle
physics [2]. Nowadays, the Worldwide LHC Computing Grid (WLCG)
runs on the infrastructure provided by the European initiative
Enabling Grids for E-science (EGEE) [3], encompassing European
national and regional Grids, coupled to the American Open
Science Grid (OSG) and collaborating Grid centres in the
Asia-Pacific region (cf. Fig. 1). This infrastructure is
shared with a number of smaller projects.

A. The Large-scale and Local Grid Architecture and 
Middleware 

The overall WLCG computing architecture is based on the
hierarchical multi-tier model developed by the MONARC
Collaboration [4], as given in Fig. 2. The top Tier-0 is
responsible for storage of raw data coming from the experiment
Data Acquisition System, their first off-line processing and
distribution of data over Tier-1s. All data are copied to
Tier-1 centres in order to speed up access during processing
and ensure storage redundancy. Data reprocessing and
higher-level reconstructions of real and simulated data are
foreseen to be performed at the Tier-1 and Tier-2 levels. Data
and physics analyses are normally relegated to Tier-2 and
Tier-3 centres, closer to end-users. Tier-2s are powerful
enough not only to support local needs but also to complement
higher tiers with computing power and storage for more
specialized purposes. Tier-2s are not required to provide
massive tape storage.

^The LHC is a collider of protons at the total operational
centre-of-mass energy of 14 TeV, located at the European
Organization for Nuclear Research (CERN) near Geneva,
Switzerland.

Fig. 1. The map and basic data on the WLCG infrastructure.
Collaborating sites are indicated as green points. Basic data
for resources are given in the frame. The map is published on
the LCG project pages [2].
Fig. 2. Multitier tree-like LCG MONARC model. The DAQ stands for
the experiment's Data Acquisition System and T0-3 for the Tier-0
to Tier-3 levels of data processing. Required disc storage in
petabytes (PB) and CPU in Mega SpecInt2000 (MSI2k), and
connectivities between levels in bytes per second (bps) are
indicated.



Based on the gLite-3.0 middleware [5], WLCG builds an
environment with the physical resource layer hidden behind
core Grid services (cf. Fig. 3). Essential services, e.g.
file and metadata catalogues, replica location, application
resource catalogues or workflow engines, are either available
directly to users or support other, more complex services;
for example, Grid monitoring is used by resource brokers for
process management. Intelligent scheduling and resource
brokering are now combined in a complex Workload Management
service. This system comprises a set of Grid middleware
components responsible for the distribution and management
of tasks across the Grid.

Fig. 3. Functional layers in Grid architecture. Physical
resources and services in Local Area Networks (LANs) are
available on the Wide Area Network (WAN), together with core
and gateway services. End-users may access both the services
and resources via dedicated portals ensuring easier workflow
design.



Information, monitoring and logging are available through
the Relational Grid Monitoring Architecture (R-GMA) service,
an implementation of the GMA standard. Fig. 4 presents the
numbers of CEs and jobs monitored during one year, as provided
by the R-GMA Monitoring Service. Users interact with R-GMA
through Application Programming Interfaces available for
high-level programming languages.

Fig. 4. Records of numbers of CEs (upper) and jobs (lower)
monitored over a year. Drops are seen during Easter in April,
summer holidays in August and Christmas in December.

Crucial for Grid management is the Distributed Grid
Accounting System (DGAS). Its purpose is to implement
resource usage metering, accounting and account balancing
in a fully distributed Grid environment. The last function,
i.e. account balancing, is not yet in use, because the Grid
has not yet entered a commercial phase.

Large amounts of data are handled with the dCache system
[8]. Terabytes of data are distributed over many disc
storage nodes, but the name space is uniquely represented
within a single file system tree. The system has been shown
to significantly improve the efficiency of connected tape
storage systems through caching, buffer optimization and
scheduled staging techniques. Furthermore, it optimizes
the throughput to and from data clients and smooths the
load of the connected disc storage nodes by dynamically
replicating datasets.

Following the recent trend towards Service Oriented
Architecture (SOA), Grid components are being re-engineered
as Web services and published on the net. Dedicated portals
are used for designing complex workflows within SOA.

Users are organized in Virtual Organizations (VOs) and
managed within VOs through the Virtual Organization
Membership Service (VOMS).

Each site is designed in a way typical for GLOBUS-operated
[6] Grids (cf. Fig. 5). It contains the Computing Element
(CE), normally consisting of a front-end machine playing the
gatekeeping role and a set of Worker Nodes (WNs). The Storage
Element (SE) is the main data container at a site and usually
consists of disc arrays managed by a dedicated machine. The
User Interfaces (UIs), enabling user access to the
infrastructure, may be either located on the spot or
installed remotely.

Fig. 5. Minimal site structure in a computing Grid based on
the GLOBUS Toolkit. The CE and SE have public IP addresses and
can be accessed directly from the WAN.

B. Connectivity 

The Grids of EGEE and WLCG are built on top of the GEANT
network [7] - a collaboration of 26 national research and
education networks in Europe, led by the DANTE Company^.
The GEANT Project aims to deliver a quality-of-service,
gigabit-speed backbone network for research in Europe. The
GEANT connectivity scheme is given in Fig. 6. GEANT has
12 Gbps connectivity to North America and 2.5 Gbps to Japan
and to the Trans-Eurasia Information Network (TEIN2), thus
ensuring collaboration of EGEE with OSG, the Japanese
National Research Grid Initiative (NAREGI) and Asia-Pacific
Grids. The entry point to the Polish National Regional
Network has a bandwidth of 10 Gbps with a 4470 B maximum
transmission unit on the switch.

^DANTE is an acronym for Delivery of Advanced Network
Technology to Europe Limited, located in Cambridge, England.

Fig. 6. GEANT connectivity scheme in Europe with bandwidths,
indicated in colours, between national access points. GEANT is
operated by DANTE on behalf of Europe's research and education
networks. The map is being updated on the GEANT project pages
[7].

C. Computing Models of Principal End-users 

Four LHC experiments, ALICE [9], ATLAS [10], CMS [11] and
LHCB [12], represent the largest consumers of resources on
the Grid. Raw data (RAW) coming from a real experiment's DAQ
or from Monte Carlo simulation are recorded and processed
off-line by reconstruction programs giving Event Summary Data
(ESD), by programs extracting physical variables and providing
Analysis Object Data (AOD), and by further data reduction,
selection and filtering, leading to Event Tags (TAG). In
addition, derived streams of filtered data at the ESD and AOD
levels and specialized data for detector alignment and
calibration are recorded and analysed. The Grid is also used
for the quasi on-line processing of data used for on-line
calibration and filtering of data in the framework of the
Interactive European Grid (IEG) project [13]. A summary of
data-flow parameters and requirements for off-line resources
is given in Tab. I. Features of the computing models such as
data flows, number of computing passes at each level, data
redundancy, interactions of streams etc. differ between
experiments according to the specifics of the physics
processes and of the detectors built to observe them.
Important differences in implementations of those models
result in many experiment-specific tools and services used by
each experimental group.

TABLE I
Data flow and resource requirements for LHC experiments

                    ALICE   ATLAS   CMS   LHCB
event rate (Hz)        50     100   100    200
Byte flow (MB/s)     1250     100   100     25
CPU (kSI2k)            30      23    40     13
Storage (PB/y)         25      17    30      7
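
The first two rows of Tab. I can be combined into an implied average event size per experiment. The following is only a minimal sketch, assuming that the byte flow and the event rate in Tab. I refer to the same data stream; the function and variable names are illustrative and do not come from any experiment's software.

```python
# Values copied from Tab. I: event rate in Hz, byte flow in MB/s.
TABLE_I = {
    "ALICE": {"event_rate_hz": 50, "byte_flow_mb_s": 1250},
    "ATLAS": {"event_rate_hz": 100, "byte_flow_mb_s": 100},
    "CMS": {"event_rate_hz": 100, "byte_flow_mb_s": 100},
    "LHCB": {"event_rate_hz": 200, "byte_flow_mb_s": 25},
}


def mean_event_size_mb(experiment: str) -> float:
    """Implied average event size in MB: byte flow divided by event rate.

    Assumption: both numbers describe the same stream of recorded events.
    """
    row = TABLE_I[experiment]
    return row["byte_flow_mb_s"] / row["event_rate_hz"]


if __name__ == "__main__":
    for name in TABLE_I:
        print(f"{name}: {mean_event_size_mb(name):.3f} MB/event")
```

Under this assumption the implied event sizes differ by more than two orders of magnitude between experiments, which is one concrete way in which the computing models differ.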

II. Polish Tier-2 Infrastructure 

Polish physics groups are involved in four LHC
experiments and in other high-rate experiments using
the Grid, e.g. COMPASS [14] at CERN and ZEUS
[15] at Deutsches Elektronen-Synchrotron (DESY). These
groups are mostly affiliated with the Cracow and Warsaw
high-energy physics laboratories.

A. Tier-2 - Tier-1 Connectivity 

A large part of the computing resources is located in these
two cities and, in addition, in Poznan, which hosts no physics
groups but is the seat of the operator of the Polish backbone
computing network PIONIER. These computing centres constitute
the Polish distributed Tier-2, connected to the Tier-1 centre
at Forschungszentrum Karlsruhe (FZK) in Germany. The PIONIER
network interconnects the Polish Tier-2 computing centres with
a dedicated bandwidth of 1 Gbps and provides a
bandwidth-splitting DWDM interface to the 10 Gbps backbone of
the Deutsches Forschungsnetz (DFN) (cf. Fig. 7). The Polish
Tier-2 centres and FZK constitute a Virtual Local Area Network
(VLAN) with the address pool 212.191.227.xxx.

Fig. 7. Internal connectivity scheme for the Polish distributed
Tier-2 based on high-performance computing centres at the
Interdisciplinary Centre for Mathematical and Computational
Modelling of Warsaw University (ICM), the Cracow Academic
Computing Centre of the Academy of Mining and Metallurgy
(CYFRONET) and the Poznan Supercomputing and Networking Centre
of the Institute of Bioorganic Chemistry of the Polish Academy
of Sciences (PSNC). The Polish national network is connected
to the German national network via a wave-splitting DWDM
multiplexer situated in Slubice.
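
As a rough illustration of what the dedicated 1 Gbps bandwidth means for data movement, the sketch below estimates an idealized transfer time. It assumes decimal units (1 TB = 10^12 bytes), a fully utilized link and no protocol overhead, so real transfers would take longer; the function name is hypothetical.

```python
def transfer_time_hours(volume_tb: float, bandwidth_gbps: float) -> float:
    """Idealized time to move `volume_tb` terabytes over a link.

    Assumptions: 1 TB = 1e12 bytes, 1 Gbps = 1e9 bits per second,
    the link is fully used and there is no protocol overhead.
    """
    bits = volume_tb * 1e12 * 8
    seconds = bits / (bandwidth_gbps * 1e9)
    return seconds / 3600.0


if __name__ == "__main__":
    # One terabyte over the dedicated 1 Gbps Tier-2 bandwidth.
    print(f"{transfer_time_hours(1.0, 1.0):.2f} h")
    # The same volume over a 10 Gbps backbone.
    print(f"{transfer_time_hours(1.0, 10.0):.2f} h")
```

With these assumptions a terabyte needs a little over two hours at 1 Gbps, which gives a feeling for why dedicated bandwidth between Tier-2 and Tier-1 matters.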

B. Computing Infrastructure at Polish Tier-2 Centres 

Computing resources provided by Polish centres to Tier-2 are
summarized in Tab. II. The clusters are not homogeneous, and
different computing platforms and server hardware solutions
are used. As for the CPU, the 64-bit processor architecture
prevails. For example, at ICM the CE is based on AMD Opteron
250 processors assembled in Sun Fire v20 and v40 servers.
Rack-mounted WNs and the StorEdge SE are interconnected with
routable Nortel Baystack 5510-48T and Nortel Baystack 425-24T
switches. An automated IPMI tool was developed for efficient
cluster management [17].

TABLE II
Physical computing resources provided by three Tier-2
computing centres in Poland

site name             CPU available   Storage on SE (TB)
AMD64.PSNC.pl                   222                  4.3
CYFRONET-IA64                    34                  0.3
CYFRONET-LCG2                   274                 21.3
egee.man.poznan.pl              132                  5.2
WARSAW-EGEE                     224                  5.9
Total                           886                 37

Just before the start-up of the LHC, Polish Tier-2 resources
amount to almost 900 CPUs and 37 TB of disc space on storage
elements. This represents 2.5% and 0.3%, respectively, of the
total resources of 36,000 CPUs and 13.5 PB of storage. Similar
figures for many Tier-2s are still below the requirements
mandatory at the LHC running time. Depending on the experiment
and its computing model, the CPU and disc storage are expected
to be a couple of times and an order of magnitude higher,
respectively. Resource doubling every year, i.e. faster than
Moore's Law, is planned during the LHC operation time (cf.
Fig. 8). Suitable investment for the Polish Tier-2 is underway.
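
The quoted shares and the planned yearly doubling can be checked with a few lines of arithmetic. The sketch below uses only the totals stated in the text; the comparison doubling period of two years is an assumption chosen to represent a common reading of Moore's Law, and the function names are illustrative.

```python
POLISH_CPUS = 886            # total from Tab. II
POLISH_STORAGE_TB = 37.0     # total from Tab. II
TOTAL_CPUS = 36_000          # total resources quoted in the text
TOTAL_STORAGE_TB = 13_500.0  # 13.5 PB quoted in the text


def share_percent(part: float, whole: float) -> float:
    """Fraction of the total, expressed in percent."""
    return 100.0 * part / whole


def projected(initial: float, years: float,
              doubling_period_years: float = 1.0) -> float:
    """Resources after `years` of exponential growth.

    The default doubles every year, as planned for LHC operation.
    """
    return initial * 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    print(f"CPU share:     {share_percent(POLISH_CPUS, TOTAL_CPUS):.1f} %")
    print(f"storage share: "
          f"{share_percent(POLISH_STORAGE_TB, TOTAL_STORAGE_TB):.1f} %")
    # Planned yearly doubling versus an assumed two-year doubling period.
    print(projected(POLISH_CPUS, 4), projected(POLISH_CPUS, 4, 2.0))
```

Rounded to one decimal place, the shares reproduce the 2.5% and 0.3% given above.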
Fig. 8. Foreseen growth of CPU (purple) and disc storage (blue)
in Tier-2.



Each site on the Grid is permanently monitored by the
Grid Operational Centre, and its status, both physical and
functional, is made available on the Academia Sinica Web
Host [16]. Example plots showing CPU usage, numbers of jobs
and disc storage usage are given in Fig. 9. A more detailed
insight into the distribution of disc storage over VOs is
displayed in Fig. 10.

C. VO Support and Resource Sharing 

Tab. III shows resource sharing over virtual organizations
in Polish Tier-2 clusters. Besides the already mentioned
VOs, one finds VOs related to the Baltic Grid project
(BALTGRID), biology (BIOMED), chemistry (COMPCHEM), the
internal EGEE development VO (DTEAM), the EU-China Grid
initiative (EUCHINA), the Central European Federation VO
(VOCE) and a couple of minor VOs. In this report we do not
distinguish between EGEE VOs, official global VOs, official
local VOs and others, although these distinctions are
important from the managerial viewpoint.

Inspection of the table reveals differences in local
policies of resource allocation to VOs. There is as yet no
official regulation for these policies, and resource
allocations are usually negotiated between user communities
and site managements. In order to ensure optimal CPU usage,
a fair-share system is normally implemented in queues, unless
a given VO uses privately funded machines. In future, a
quality-of-service system allowing reservations and hiring
is foreseen for both computational resources and
communication bandwidth. These issues are related to the
future commercialization of the Grid.

Fig. 9. Monitoring plots provided by the Grid Index Information
Service tool. In the upper panel, the total number of Worker
Nodes' CPUs available for users is shown in red and free CPUs
in blue. The middle panel presents numbers of running (green)
and waiting (blue) jobs. The lower panel shows disc storage on
Storage Elements available (green) and used (blue). A temporary
connectivity drop is seen as a dip in the two upper panels.
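
The idea of a fair-share system in queues can be illustrated with a minimal sketch. It is not the scheduler used at any of the sites: the rule (serve the VO whose recent usage lies furthest below its negotiated share), the function name and the example numbers are all assumptions made for illustration.

```python
from typing import Mapping


def next_vo(target_share: Mapping[str, float],
            cpu_hours_used: Mapping[str, float]) -> str:
    """Pick the VO whose recent usage is furthest below its target share.

    `target_share` maps a VO name to its negotiated fraction of the CPUs;
    `cpu_hours_used` maps a VO name to its recent consumption.
    Both the rule and the inputs are hypothetical.
    """
    total = sum(cpu_hours_used.values())

    def deficit(vo: str) -> float:
        used = cpu_hours_used.get(vo, 0.0)
        used_fraction = used / total if total > 0 else 0.0
        return target_share[vo] - used_fraction

    return max(target_share, key=deficit)


if __name__ == "__main__":
    # Illustrative shares and usage, not taken from Tab. III.
    shares = {"atlas": 0.5, "cms": 0.3, "biomed": 0.2}
    usage = {"atlas": 600.0, "cms": 350.0, "biomed": 50.0}
    print(next_vo(shares, usage))
```

In this example the VO that has consumed least relative to its share is served next, so idle CPUs are not withheld from any VO while long-term usage drifts towards the negotiated allocation.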

III. Operations of Polish Tier-2 
A. Daily Operations and User Support 

The European Grid is intended to provide a permanent
and reliable infrastructure for research and science. The
hardware is run by staff of participating institutions and is
under local responsibility. Both central and local services
are run by dedicated groups of people and are shared
between partners, depending on their competence, size
and the needs of regional scientific groups. The case of
large LHC collaborations, consisting even of thousands
of researchers, somewhat violates this scheme. While
the lowest-order Tier-3 nodes are traditionally situated in
scientific institutes, Tier-2s and Tier-1s are often run by
large regional or national computing centres, capable of
fulfilling operational requirements.

Daily operations are monitored by the Grid Operational
Centre (GOC) located in the UK [18]. The GOC is responsible
for coordinating the overall operation of the Grid. It
acts as a central point of operational information such as
configuration information and contact details. The GOC
has responsibility for monitoring the operation of the
Grid infrastructure as a whole, devising and managing
mechanisms and procedures which encourage optimal
operation of the Grid, and working with Local Support
Groups to assist them in providing the best possible
service while their equipment is connected to the Grid.

Fig. 10. Usage of disc storage by VOs at ICM (upper) and
CYFRONET (lower).

Basic functionality of services is regularly tested and
monitored using Site Functional Tests (SFTs). The SFT
uses a small test job that runs at each site and determines
the availability of the main Grid functions. Similarly,
the Grid Status Monitor retrieves information published
by each site about its status. Their use and the subsequent
triggering of follow-up actions are supervised by the Core
Infrastructure Centre (CIC) on-Duty staff, who raise
operational tickets against sites to resolve observed
problems.

A two-level ticketing system is in place. The ticket flow
diagram is displayed in Fig. 11.

The Global Grid User Support (GGUS) portal run by FZK
is the principal entry point for all sorts of tickets [19].
Daily ticket operations, including initial recognition of
the type of the problem, opening and assigning the ticket
to supporters, directing it to Local Support units, and
taking care of timely resolution and contact with users, are
the duty of the Ticket Processing Management Group (TPM).
The TPM works in a shift system, 5 days a week, 8 hours a day.
SFT failure tickets are sent to GGUS and redirected
automatically to Local Support units and from there to
the front-line supporters at sites. Tickets of another type
are issued by end-users, who may encounter many kinds of
problems with the system or with applications. User tickets,
depending on the type of the problem, are either solved
by a dedicated group of supporters called in by the TPM from
GGUS, or redirected to Local Support units and solved
there.
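
The two-level ticket flow described above can be summarized as a small routing function. This is a sketch of the described procedure, not the GGUS implementation: the class, the field names and the problem-type labels ("site", "middleware") are hypothetical and serve only to make the two branches explicit.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Ticket:
    source: str        # "SFT" for test failures, "USER" for end-user tickets
    problem_type: str  # hypothetical label, e.g. "site" or "middleware"


def route(ticket: Ticket) -> List[str]:
    """Return the chain of units that handle a ticket, in order."""
    if ticket.source == "SFT":
        # SFT failures are redirected automatically, without TPM triage.
        return ["GGUS", "Local Support unit", "site supporters"]
    if ticket.source == "USER":
        # The TPM recognizes the problem type and assigns the ticket.
        if ticket.problem_type == "site":
            return ["GGUS", "TPM", "Local Support unit"]
        return ["GGUS", "TPM", "dedicated support group"]
    raise ValueError(f"unknown ticket source: {ticket.source!r}")


if __name__ == "__main__":
    print(route(Ticket("SFT", "site")))
    print(route(Ticket("USER", "middleware")))
```

Writing the flow this way makes clear that only user tickets depend on the TPM shift schedule, while SFT tickets reach the sites automatically.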

There is at least one Local Support unit in each Federation.
For the Polish Tier-2, a ticketing tool [20] is run using the
One-or-zero portal software [21].
TABLE III
Virtual Organizations supported by Polish Tier-2 centres

IV. Interactive Grid Infrastructure

The bulk of applications in experimental particle physics
and other sciences need high-throughput batch processing
of large amounts of data. For a number of applications,
however, frequent interaction with intermediate results or
a fast response to a well-defined computational problem is
desired. Applications of this sort, called (quasi-)
interactive, have drawn the attention of the Grid community
almost from the beginning. In Poland, involvement in
deploying such applications on the Grid, and in building
appropriate tools and infrastructure for them, dates back to
the CrossGrid Project [22] and is nowadays continued in the
framework of the IEG project [13].

The IEG resources are split into two separate
infrastructures:

• the production infrastructure, aimed at providing
computing and storage resources for the end-users
running scientific applications,

• the development infrastructure, fully independent
of the production one and aimed at supporting
the Project software development, the testing of new
middleware and its rollout process. As such, this
infrastructure does not provide a service as stable
and reliable as the production one. Development sites
may also be occasionally reconfigured with specific
setups to evaluate or validate software components.

Currently, the production infrastructure provides 300
CPU cores and 8 TB of disc storage, located in 8 computing
centres in Europe, with a considerable contribution from
three Polish computing centres.

The IEG supports the following interactive applications:

• Ultra Sound Computer Tomography.

• Medical Applications on Brain Images.

• Flood Forecasting application. This application was 
first deployed on the CrossGrid testbed. 

• Visualisation of Baltic Wave Model. 

• Evolution of pollution clouds in the atmosphere. 
This application was first deployed on the CrossGrid 
testbed. 

• ATLAS online monitoring and calibration system. 
This application is related to ATLAS experiment at 
LHC but it does not run on the WLCG infrastructure. 

• Analysis of Maps of Cosmic Microwave Back- 
ground. 

• Visualization of Plasma in Fusion Reactors. This 
application runs also in less interactive mode on the 
EGEE infrastructure. 

V. Training, Demonstration and Diffusion Activities

Being a global-scale initiative with large investment and
social impact, the computing Grid needs associated actions
that attract and train users and explain the newest
Grid technology to the wider public.

Training is provided by organizing courses, normally
given by staff members of academic partner institutes of
large Grid projects (cf. e.g. refs [13], [23]), and using
dedicated training infrastructure. Courses are attended by
university students at the engineering, M.Sc. and
Ph.D. levels, practicing scientists in informatics, natural
sciences and engineering, and developers and managerial staff
from commercial companies.

Fig. 11. Ticket flow scheme. Tickets issued on SFT failures are
put onto GGUS and automatically directed to Local Support units
and solved there. End-users' tickets can be put either to GGUS
or to Local Support units. These tickets are solved either by
local staff or by supporters from dedicated support groups,
depending on the type of the problem. Daily ticket handling is
performed by the TPM group. Solved tickets are stored in a
database.

Dissemination of Grid technology is assured by its active
promotion in communities of actual and potential users
through press news and specialized publications,
participation in conferences covering as wide a spectrum of
subjects as possible, and the media [23], [24]. On-demand
TV, nowadays a distributor of knowledge, may shortly become
a customer of the Grid, exploiting its computing power for
its own programme casting and production.

VI. Outlook and Perspectives 

The WLCG is an example of a well-developed Grid
infrastructure for science. To a large extent, however, it
was designed for the specific needs of experimental particle
physics, where high-throughput, massive, asynchronous,
data-intensive processing of segmented data is needed.
Parallel processing using Message Passing Interface software
is rather rare. Gateway services such as application resource
catalogues or workflow engines are used only occasionally.
Workflow management is quite often done semi-manually, and
resource brokerage is still far from being adaptive and
autonomous. The prospective line of development leads toward
SOA, where applications are distributed over the network and
are accessible from everywhere as services. Data are going to
be virtualized (dCache) and workflows dynamically designed
and redesigned according to needs (cf. e.g. the PGrade portal
[25]).

After fulfilling its LHC commitments, the Polish Tier-2
should evolve towards a new computational paradigm in which
complex research scenarios are executed in response to
external events (e.g. rapid weather change) in closed loops
with instruments and humans (interactivity). On-demand
allocation of computing resources should ensure that
important problems, once identified, are solved.

From the commercial perspective, other aspects of the Grid
should be underlined. For pure research, robustness and
security of applications are not really critical unless
facilities are being built. Scientific groups gladly relegate
operations to commercial entities. But this practice often
has undesirable results for science, because business itself
is interested in research only when income is in sight. An
interesting game between two aspects of distributed,
large-scale computing, the economic and the research one,
lies ahead of us.